Linearizability Violations in Cassandra

Let's examine two different scenarios where Cassandra violates linearizability.


It is important to note that operations on a single row are not linearizable by default, even when using majority quorums.

To understand why, it is useful to learn how Cassandra handles partial failures and network delays.

Without read repair

In this scenario, we assume that read repair is not used. The system consists of three replicas storing a single row with a single column, owner, whose initial value is none.

Client A initially performs a write operation to set owner = A. While this operation is in progress, two clients, B and C, perform read operations for owner in sequence. Client B's majority quorum contains a replica that has already received the write, while client C contacts a quorum of nodes that haven't received it yet. As a result, client B reads owner = A, while client C reads owner = none, even though client C's read started after client B's read had completed. This violates linearizability.
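
The way a read coordinator merges replica responses can be sketched as last-write-wins (LWW) resolution over (value, timestamp) pairs. This is an illustrative model, not Cassandra's actual internals; the names below are made up for the sketch:

```python
from dataclasses import dataclass

@dataclass
class Version:
    value: str
    timestamp: int  # write timestamp attached to the value

def resolve(responses):
    """Pick the version with the highest timestamp (last-write-wins)."""
    return max(responses, key=lambda v: v.timestamp)

# Client B's quorum: replica 3 (stale) and replica 1 (already applied the write).
print(resolve([Version("none", 0), Version("A", 1)]).value)   # A

# Client C's quorum: replicas 3 and 2, neither of which has seen the write yet.
print(resolve([Version("none", 0), Version("none", 0)]).value)  # none
```

Because each quorum resolves only the values it happens to see, two overlapping-in-time reads can return different results, which is exactly the violation described above.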

The following walkthrough shows this phenomenon step by step:

1. The system consists of three clients and three replicas; initially, every replica holds owner = none.
2. Client A sends a write request to set owner = A.
3. The write request reaches replica 1, which updates its owner to A.
4. While the write operation is in progress, client B sends a read request for owner.
5. Client B's read request reaches replica 3, which will also ask replica 1 before replying, in order to form a majority quorum.
6. Replica 1 forwards the write request to replica 2.
7. Replica 3 asks replica 1 for the value of owner.
8. The read request reaches replica 1, which returns the value A.
9. Between the values none and A, replica 3 chooses A, which has the later timestamp. However, it does not update its local state, since there is no read repair.
10. Client B receives owner = A in response to its read request.
11. While the write operation is still in progress, and right after client B's read, client C also sends a read request for owner.
12. Client C's read request reaches replica 3, which will also ask replica 2 before replying, in order to form a majority quorum.
13. Replica 3 sends the read request to replica 2.
14. The read request reaches replica 2, which returns the value none.
15. Replica 3 receives the response none, which matches its own local value.
16. Client C receives none as the response to its read request.
17. The write request finally reaches replica 2, which updates its owner to A.
18. Replica 2 acknowledges the completed write back to replica 1.
19. Replica 1 returns the write response to client A after receiving the acknowledgement from replica 2.
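
The sequence above can be condensed into a small simulation. This is only a sketch of quorum reads without read repair, not real Cassandra code; the replica and quorum choices mirror the walkthrough:

```python
# Three replicas, each holding (value, timestamp) for the single column `owner`.
replicas = {1: ("none", 0), 2: ("none", 0), 3: ("none", 0)}

def write(replica_id, value, ts):
    replicas[replica_id] = (value, ts)

def quorum_read(quorum):
    """Return the LWW winner across the quorum. Without read repair,
    stale replicas in the quorum are left untouched."""
    return max((replicas[r] for r in quorum), key=lambda vt: vt[1])[0]

write(1, "A", ts=1)       # client A's write reaches replica 1 first
b = quorum_read({3, 1})   # client B's quorum includes replica 1
c = quorum_read({3, 2})   # client C's quorum misses the in-flight write
write(2, "A", ts=1)       # the write reaches replica 2 only later

print(b, c)  # A none: the later read (C) observed an older state
```

Client C's read starts after client B's completes, yet it returns the older value, which is the linearizability violation in a nutshell.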

With read repair

The violation of linearizability in the previous example would be eliminated if read repair were used, since the read from client B would propagate the value to replica 2, and client C would then also read owner = A.

So, let's assume that read repair is used and examine a different scenario. Client A again performs a write operation to set owner = A. The write succeeds on one replica but fails on another, so it does not reach a majority: the write is considered unsuccessful, and the coordinator returns a failure response to the client. Afterward, client B performs a read operation using a quorum that contains the replica where the write succeeded.

Cassandra performs a read repair using the last-write-wins (LWW) strategy, thus propagating the value to replica 2. Consequently, a write operation that failed has affected the state of the database, violating linearizability. This example is shown in the following walkthrough:

1. The system again consists of three clients and three replicas; initially, every replica holds owner = none.
2. Client A sends a write request to set owner = A.
3. While the write operation is in progress, client B sends a read request for owner.
4. The write request reaches replica 1, which updates its owner to A.
5. Client B's read request reaches replica 3, which will also ask replica 1 before replying, in order to form a majority quorum.
6. Replica 3 asks replica 1 for the value of owner.
7. Replica 1 forwards the write request to replica 2.
8. The read request reaches replica 1, which returns the value A.
9. Since replica 1 has already performed the write operation, it replies to client B's read request with the updated value A.
10. Replica 3 receives the value A from replica 1. Between A and none, it keeps A, which has the later timestamp, and also updates its local state to A thanks to read repair.
11. Client B receives A in response to its read request.
12. The write request reaches replica 2, but the write operation fails there, so replica 2's owner value remains none.
13. After client B's read request, client C also sends a read request for owner to the replicas in its quorum.
14. Replica 2 informs replica 1 that it failed to perform the write operation.
15. Client C's read request reaches replica 3, which will also ask replica 2 before replying, in order to form a majority quorum.
16. Replica 3 sends the read request to replica 2.
17. Replica 1 returns a failure response to client A, because the write operation did not succeed on one of the replicas in the write quorum.
18. The read request from replica 3 reaches replica 2, which still has none as the value of owner.
19. Replica 2 replies to replica 3 with none. Between none and A, replica 3 selects A, since it has the later timestamp.
20. Replica 3 sends a read_repair request to replica 2.
21. The read_repair request reaches replica 2, which updates its owner value from none to A.
22. Replica 2 acknowledges the read repair to replica 3.
23. Replica 3 replies to client C with the value A, even though the write operation was declared failed.
24. Client C receives the value A, even though the write operation was declared failed. This violates linearizability.
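
A condensed sketch of this second scenario, again illustrative rather than real Cassandra code: the write needs a majority (two of three replicas) to succeed, yet a quorum read with read repair propagates the value that the failed write left behind:

```python
# Three replicas, each holding (value, timestamp) for the column `owner`.
replicas = {1: ("none", 0), 2: ("none", 0), 3: ("none", 0)}

def quorum_write(value, ts, targets, fail_at):
    """Apply the write at each target; succeed only if a majority acks."""
    acks = 0
    for r in targets:
        if r not in fail_at:
            replicas[r] = (value, ts)
            acks += 1
    return acks >= 2  # majority of 3 replicas

def quorum_read_with_repair(quorum):
    """Return the LWW winner and push it back to stale quorum members."""
    winner = max((replicas[r] for r in quorum), key=lambda vt: vt[1])
    for r in quorum:              # read repair step
        replicas[r] = winner
    return winner[0]

ok = quorum_write("A", ts=1, targets={1, 2}, fail_at={2})  # only replica 1 acks
b = quorum_read_with_repair({3, 1})  # B's read repairs replica 3 to A
c = quorum_read_with_repair({3, 2})  # C sees A via the repaired replica 3

print(ok, b, c)  # False A A: a "failed" write still changed observed state
```

Even though `quorum_write` reports failure, both subsequent reads observe owner = A: the partial write leaked into the database state through read repair, mirroring the violation in the walkthrough.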
